

MARSHALL GRANT  
IN-60-CR

319234

P13

Exploration  
of  
Operator Method Digital Optical Computers  
For  
Application To NASA

Final Report  
UAH SUB89-116  
NAG3-777

(NASA-CR-187663) EXPLORATION OF OPERATOR  
METHOD DIGITAL OPTICAL COMPUTERS FOR  
APPLICATION TO NASA Final Report (Alabama  
Univ.) 13 p

N91-15695

CSCL 09B

Unclass

03/60 0319234


## 1.0 INTRODUCTION

*DISCUSSION OF ULTIMATE PERFORMANCE LIMITS OF DIGITAL OPTICAL COMPUTERS*

Digital optical computer design has been focused primarily towards "parallel" implementation as shown in Figure 1. As shown, these typical machines have two planes of inputs and one output plane. Input planes for the "A" and "B" inputs have been implemented with various forms of spatial light modulators. Multichannel acousto-optic devices are used primarily due to the device performance and availability. An example can be found in reference 2.

We refer to parallel in the strict sense of single point-to-point interconnection, as shown in the figure. This type of architecture is the simplest to implement in hardware because lenses can simply image points on the first input plane onto points on the second input plane, and then image the resulting binary product onto an output detection plane.

In terms of expected performance, Figure 2 compares this type of architecture to currently developing VHSIC systems. Using demonstrated multichannel acousto-optic devices, a figure of merit can be formulated. Here we focus on a figure of merit termed "Gate Interconnect Bandwidth Product" or GIBP. This is equivalently the number of two-input gates connected together times their utilization per second. As can be seen in figure 2, for the multichannel acousto-optic device, the number of effective gates is calculated to be 16,384, or simply the total interconnect of two  $32 \times 512$



Figure 1: Conventional "parallel" implementation of optical interconnects for digital optical computing.

elements in each plane. The 32 comes from the number of channels and the 512 from the time-bandwidth product, or number of resolution elements, in each channel. Since these devices can be clocked at 100 MHz (a 10 ns effective gate time), the total GIBP is calculated as  $1.6 \times 10^{12}$ . We feel this represents a true measure of speed. VHSIC chips today may exhibit in excess of  $10^5$  gates/chip with clock periods approaching 10 ns ( $10^8$  Hz). Thus VHSIC performance reaches  $10^{13}$  GIBP. Once again, algorithmic efficiency affects the total performance, but from the simple GIBP comparison one can see that parallel optical implementations of digital computers barely, if at all, compete with semiconductor VHSIC devices with respect to GIBP.

| DIGITAL TECHNOLOGY |                                                                                                           |                                                           |
|--------------------|-----------------------------------------------------------------------------------------------------------|-----------------------------------------------------------|
|                    | Parallel Optics:                                                                                          | VHSIC:                                                    |
| Today:             | 32x512 A.O. devices<br>$16,384$ gates ( $128^2$ )<br>$10^8$ Hz (10ns clock)<br>$1.6 \times 10^{12}$ GIBP* | $10^5$ gates<br>$10^8$ Hz (10ns clock)<br>$10^{13}$ GIBP* |
| Tomorrow:          | 1000x1000 SLM<br>$10^6$ gates<br>$10^9$ Hz (1ns clock)<br>$10^{15}$ GIBP*                                 | $10^6$ gates<br>$10^8$ Hz (10ns clock)<br>$10^{14}$ GIBP* |

Figure 2: Performance comparison illustration of Parallel Optical implementation vs. VHSIC.

\*Gate Interconnect Bandwidth Product
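The entries in Figure 2 can be reproduced with a few lines of arithmetic. The following is a minimal sketch, assuming GIBP is simply the gate count times the clock rate as defined above; the function names are illustrative and do not come from the report.

```python
# Sketch: GIBP for point-to-point ("parallel") optical interconnects vs. VHSIC.
# Assumption: GIBP = (number of two-input gates) * (clock rate in Hz).

def parallel_gibp(channels: int, elements_per_channel: int, clock_hz: float) -> float:
    """In a parallel architecture each element of plane A meets exactly one
    element of plane B, so the gate count equals the elements in one plane."""
    gates = channels * elements_per_channel
    return gates * clock_hz


def chip_gibp(gates: int, clock_hz: float) -> float:
    """GIBP of a semiconductor chip with the given gate count and clock."""
    return gates * clock_hz


# "Today" column: 32-channel acousto-optic device, 512 resolution elements, 100 MHz.
optics_today = parallel_gibp(32, 512, 1e8)
vhsic_today = chip_gibp(10**5, 1e8)

# "Tomorrow" column: 1000x1000 SLM at 1 ns vs. a 10^6-gate chip at 10 ns.
optics_tomorrow = parallel_gibp(1000, 1000, 1e9)
vhsic_tomorrow = chip_gibp(10**6, 1e8)

print(f"optics today:    {optics_today:.2e} GIBP")
print(f"VHSIC today:     {vhsic_today:.2e} GIBP")
print(f"optics tomorrow: {optics_tomorrow:.2e} GIBP")
print(f"VHSIC tomorrow:  {vhsic_tomorrow:.2e} GIBP")
```

With these inputs the parallel optical system trails a single VHSIC chip today and leads a projected chip by only one order of magnitude, which is the comparison the figure is making.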

Conventional thinking in the optical computing community has been to improve the input spatial light modulators (SLMs). Work in this area includes efforts at (1) the University of Colorado on ferroelectric liquid crystals<sup>[10]</sup>, (2) AT&T Bell Labs on quantum well SLMs<sup>[11]</sup>, (3) Texas Instruments on membrane light modulators<sup>[12]</sup>, and many others. In all cases, the objective is to produce devices that will ultimately allow 1000x1000 pixel performance. Even if this is someday accomplished at equivalent or greater clock speeds (very optimistically, a 1 ns clock), the ultimate limit of parallel optical digital computing systems can only reach a GIBP of  $10^{15}$  per computer. Today, 100 to 1000 VHSIC chips are required to achieve the same computational complexity.

It is therefore our opinion that conventional parallel optical digital computer architecture demonstrates only marginal competitiveness at best when compared to projected semiconductor implementations.

## 2.0 THE OPPORTUNITY OF GLOBAL INTERCONNECTS

Optical computers however are not limited to "parallel" interconnects. As shown in figure 3, every point at the first input plane can be connected to every point in the second input plane which can be subsequently connected to every point in the output plane.



Figure 3: Full global implementation of optical interconnects for digital optical computing.

This type of configuration is referred to as a "full global" interconnect.

Several advantages are apparent. Global optical interconnects can cross optical paths with no observable cross talk. This type of interconnect is extremely difficult with semiconductor technology due to inductive and capacitive cross talk problems, especially at high clock rates. Another advantage is the ability to achieve extremely high fan-in on the detectors, since there are no capacitive loading effects as seen in semiconductor technology. Extremely large fan-ins are projected for optics (>1000:1), whereas in semiconductor technology a fan-in greater than 10 is difficult. Consequently, global optical technology appears to be well suited for "wide word" processing. Thus the tradeoff leans towards larger multi-input gates and fewer gate delays.

The largest advantage of global interconnect is the large potential improvement in gate interconnect bandwidth, as can be seen in figure 4. Even with today's available and mature spatial light modulators like the one described earlier, i.e. a 32 channel acousto-optic device with a time-bandwidth product per channel of 512, at a 10 ns clock rate the resultant GIBP that can be achieved will approach  $2.7 \times 10^{16}$ ! Compared to current VHSIC technology, this represents over 3 orders of magnitude improvement over a dense VHSIC chip configured at 100 MHz. Another way of expressing

| DIGITAL TECHNOLOGY |                                                                                                                 |                                                              |
|--------------------|-----------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------|
|                    | Global Optics:                                                                                                  | VHSIC:                                                       |
| Today:             | 32x512 A.O. devices<br>268,435,456 gates ( $16{,}384^2$ )<br>$10^8$ Hz (10ns clock)<br>$2.7 \times 10^{16}$ GIBP* | $10^5$ gates<br>$10^8$ Hz (10ns clock)<br>$10^{13}$ GIBP* |
| Tomorrow:          | 1000x1000 SLM<br>$10^{12}$ gates ( $(10^6)^2$ )<br>$10^9$ Hz (1ns clock)<br>$10^{21}$ GIBP*                    | $10^6$ gates<br>$10^8$ Hz (10ns clock)<br>$10^{14}$ GIBP* |

\*Gate Interconnect Bandwidth Product

Figure 4: Performance comparison illustration of "full global" interconnect implementation vs. VHSIC

this improvement is to consider the optical system to be equivalent to 2700 VHSIC chips.

If 1000x1000 element spatial light modulators are indeed ever developed that operate at 1 ns, the GIBP potential of digital optical computers could ultimately approach  $10^{21}$  or 7 orders of magnitude improvement potential.
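The jump from parallel to global interconnect comes entirely from the gate count: every point in one input plane meets every point in the other, so the count is the square of the number of points per plane. A minimal sketch of that arithmetic follows, assuming the same gates-times-clock definition of GIBP; the function name is illustrative only.

```python
import math

# Sketch: GIBP for "full global" interconnects.
# Assumption: each of the N points in plane A is connected to each of the
# N points in plane B, giving N**2 effective two-input gates.

def global_gibp(channels: int, elements_per_channel: int, clock_hz: float):
    """Return (gate count, GIBP) for a fully interconnected pair of planes."""
    points_per_plane = channels * elements_per_channel
    gates = points_per_plane ** 2
    return gates, gates * clock_hz


gates_today, gibp_today = global_gibp(32, 512, 1e8)         # 10 ns clock
gates_tomorrow, gibp_tomorrow = global_gibp(1000, 1000, 1e9)  # 1 ns clock

vhsic_today = 10**5 * 1e8     # 10^5 gates at 100 MHz
vhsic_tomorrow = 10**6 * 1e8  # 10^6 gates at 100 MHz

ratio_today = gibp_today / vhsic_today
ratio_tomorrow = gibp_tomorrow / vhsic_tomorrow

print(f"today:    {gates_today:,} gates, {gibp_today:.2e} GIBP, "
      f"about {ratio_today:,.0f} VHSIC chips")
print(f"tomorrow: {gates_tomorrow:.0e} gates, {gibp_tomorrow:.0e} GIBP, "
      f"10^{math.log10(ratio_tomorrow):.0f} times a projected VHSIC chip")
```

The "today" ratio comes out near 2,700, matching the chip-equivalent figure quoted above, and the "tomorrow" ratio is seven orders of magnitude.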

## 3.0 THERMAL AND OTHER LIMITS

Although the utilization of global interconnects clearly shows great potential in terms of projected throughput/compute capability, optical computing systems offer in addition the potential of extremely low power dissipation as compared to semiconductor technology.

By using current optical technology, i.e. acousto-optic devices and avalanche photodiode arrays, photon budgets per event can approach theoretical limits. For example, a 1000 photon threshold represents  $6 \times 10^4\,kT$  at 300 K, or roughly  $60\,kT$  per photon per event. Current semiconductor technology requires at best 2 orders of magnitude, on average 4 orders of magnitude, and at most 6 orders of magnitude more power per bit, as can be seen in figure 5, compiled from references 13-17.
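The figure of roughly 60 kT per photon can be checked directly from the photon energy. The worked step below assumes the 0.81 μm source wavelength quoted in the power budget discussion of this section, together with standard values for Planck's constant, the speed of light, and Boltzmann's constant; these constants are not stated in the report.

$$\frac{E_{\text{photon}}}{kT} = \frac{hc}{\lambda\,kT} \approx \frac{(6.626 \times 10^{-34})(2.998 \times 10^{8})}{(0.81 \times 10^{-6})(1.381 \times 10^{-23})(300)} \approx 59$$

A 1000 photon threshold therefore corresponds to about  $5.9 \times 10^4\,kT$ , consistent with the rounded value of  $6 \times 10^4\,kT$ .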

Consider 1,000 photons per event. To achieve the "theoretical" limit GIBP of  $10^{21}$ , significant optical power is required. Specifically,  $10^{21}$  GIBP multiplied by 1,000 photons per event yields  $10^{24}$  photons per second. A 1 watt, 0.81 μm source will deliver  $4.075 \times 10^{18}$  photons per second. Therefore, achieving  $10^{24}$  photons per second, without consideration for losses in the system such as diffraction efficiency of the acousto-optic devices, detector responsivity and various other losses, requires a total power budget of  $10^{24} / (4.075 \times 10^{18}) \approx 245{,}398$  watts! The conclusion here is that we




do not feel that we will ever be able to fully exploit all the power in optical interconnects, i.e. ever obtain a GIBP of  $10^{21}$ !
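The power budget above can be recomputed from first principles. This is a minimal sketch, assuming a lossless system and CODATA values for the physical constants; the function names are illustrative. Because it uses unrounded constants, the result differs slightly from the 245,398 W quoted in the text.

```python
# Sketch: optical power needed to sustain a given GIBP at a fixed photon
# threshold per gate event, ignoring all system losses (an assumption).

PLANCK_J_S = 6.62607015e-34       # Planck's constant, J*s
LIGHT_SPEED_M_S = 2.99792458e8    # speed of light, m/s


def photons_per_second_per_watt(wavelength_m: float) -> float:
    """Photon flux delivered by one watt of light at the given wavelength."""
    photon_energy_j = PLANCK_J_S * LIGHT_SPEED_M_S / wavelength_m
    return 1.0 / photon_energy_j


def required_optical_power_w(gibp: float, photons_per_event: float,
                             wavelength_m: float) -> float:
    """Optical power needed for gibp gate events per second."""
    photon_rate = gibp * photons_per_event
    return photon_rate / photons_per_second_per_watt(wavelength_m)


flux = photons_per_second_per_watt(0.81e-6)
power = required_optical_power_w(1e21, 1000, 0.81e-6)

print(f"photons per second per watt at 0.81 um: {flux:.3e}")
print(f"power for 1e21 GIBP at 1000 photons/event: {power / 1e3:.0f} kW")
```

The result is on the order of a quarter of a megawatt of optical power before any losses, which supports the conclusion that the full  $10^{21}$  GIBP is not a practical target.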

So what is a reasonable performance projection? The data from figure 5 can be plotted as shown in figure 6, titled GIBP versus power. Clearly the most competitive technology is GaAs. The GaAs technology boundary shown in the figure gives GaAs the maximum allowable leverage. The line is drawn with the assumption that standard gate propagation delays of 100 ps can be used as the clock value, an assumption requiring a 40 GHz bandwidth @ RTZ format. As shown on the graph, it may be possible to achieve approximately  $10^{16}$  GIBP with a power consumption of 2 to 3 kW.

Notice that the optical device curve at 100% efficiency is at least 3 orders of magnitude better. Our current prototype, the DOC-1 (digital optical computer), is designed to operate at a GIBP of approximately  $10^{12}$  and is shown accordingly on the graph. For the moment, ignore the Bragg cell power consumption (approximately 32 watts) and the detector transimpedance amplifier / threshold circuitry (another 64 watts). Looking only at the photon budget requirement using  $\text{TeO}_2$  at typical diffraction efficiencies (here a multiplicative efficiency of 0.32% is assumed), the power consumption of 50 mW is *already superior to GaAs technology*. In addition, the substitution of GaP Bragg cells, which increase the efficiency to 12%, shows an optical power consumption on the order of 1 mW.

Unfortunately, one cannot ignore the drive and detection electronics. So let us go back to the above question of what is a reasonable projection. It appears from the graph that, for a digital optical computer to be clearly competitive, it must have at a minimum the following specifications: GIBP >  $10^{15}$ , gate efficiency > 1%, and a drive/detector power consumption of less than 100 W.



Figure 6: GIBP vs. Power consumption.


## 4.0 Analog Global

The use of global interconnects in analog optical computing is not new to the field. For example, as early as 1964, A.B. Vander Lugt invented the optical correlator as shown in figure 7 [ref. 19].

The second lens produces the Fourier transform of the input at the matched filter plane. The operation of Fourier transformation is in and of itself a global interconnect operation in two dimensions. For example if the input is a point source, the distribution in the Fourier plane is a plane wave. Thus the system globally broadcasts the light from the point source to all points in the Fourier plane. Consequently, if the input is considered as an array of point sources, one can clearly see how this system performs a "full global" interconnect.

After multiplication with the matched filter, the third lens again produces a Fourier transform. This time, the output is the Fourier transform of the product of the input's Fourier transform and the matched filter. This is commonly referred to as the correlation function. Clearly, this system can never be beaten with digital electronics because full global interconnects are used. The question is how to utilize this "correlator" type architecture in the digital regime efficiently.
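The two properties described above, global broadcast of a point source and correlation through a Fourier-plane product, can be illustrated numerically. The sketch below is a one-dimensional discrete stand-in for the optical system, not a model of it: it assumes a discrete Fourier transform in place of each lens, a matched filter equal to the complex conjugate of the reference spectrum, and an inverse transform at the output (the optical system applies a second forward transform, which only reverses the output coordinate). All names are illustrative.

```python
import cmath

def dft(x):
    """Naive discrete Fourier transform (stand-in for a Fourier lens)."""
    n = len(x)
    return [sum(x[k] * cmath.exp(-2j * cmath.pi * j * k / n) for k in range(n))
            for j in range(n)]


def idft(spectrum):
    """Naive inverse discrete Fourier transform."""
    n = len(spectrum)
    return [sum(spectrum[j] * cmath.exp(2j * cmath.pi * j * k / n) for j in range(n)) / n
            for k in range(n)]


def circular_correlation(signal, reference):
    """Correlate by multiplying the signal spectrum with the matched filter."""
    signal_spectrum = dft(signal)
    matched_filter = [r.conjugate() for r in dft(reference)]
    product = [s * f for s, f in zip(signal_spectrum, matched_filter)]
    return [value.real for value in idft(product)]


# A single point source illuminates every Fourier-plane sample equally.
point_source = [0, 0, 1, 0, 0, 0, 0, 0]
broadcast = [abs(v) for v in dft(point_source)]

# A reference pattern and the same pattern shifted by three samples.
reference = [1, 0, 1, 1, 0, 0, 0, 0]
shifted = [0, 0, 0, 1, 0, 1, 1, 0]
correlation = circular_correlation(shifted, reference)
peak_index = max(range(len(correlation)), key=lambda m: correlation[m])

print("Fourier-plane magnitudes:", [round(m, 6) for m in broadcast])
print("correlation:", [round(c, 6) for c in correlation])
print("peak at shift:", peak_index)
```

The uniform magnitudes show every input point reaching every Fourier-plane sample, and the correlation peak lands at the shift between the two patterns.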



Figure 7: The analog Vander Lugt optical correlator utilizes full global interconnects.

## 5.0 Quasi-digital

Figure 8 shows a planar global interconnect between two linear spatial light modulators and the output plane.

If two digital words are placed respectively at the two input planes an interesting phenomenon occurs.

For example, in figure 8, the three-bit words  $A(a^1, a^2, a^3)$  and  $B(b^1, b^2, b^3)$  are placed at the two input



Figure 8: Flash Digital Multiplication by Analog Convolution (DMAC) by utilizing full global interconnects.

planes as shown. Notice that, with full global interconnects, five equations are produced as follows:

$$a^3b^3 = c^5$$

$$a^2b^3 + a^3b^2 = c^4$$

$$a^1b^3 + a^2b^2 + a^3b^1 = c^3$$

$$a^1b^2 + a^2b^1 = c^2$$

$$a^1b^1 = c^1$$

Notice that these five equations produce exactly the same answer as the DMAC algorithm (digital multiplication by analog convolution) shown in figure 9. We do not propose to pursue this path. However, what is important is that "full global" interconnects produce the convolution of the two vector inputs, and they produce this full convolution in one clock cycle.
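The five sums are the discrete convolution of the two bit vectors, and weighting each sum by its power of two recovers the ordinary product. The sketch below illustrates this under one assumption not stated in the text: that superscript 1 denotes the least significant bit. The function names are illustrative.

```python
# Sketch: full global interconnect as a convolution of two bit vectors,
# with each detector acting as an analog summing node.
# Assumption: index 0 here corresponds to superscript 1 in the text and is
# treated as the least significant bit.

def global_interconnect_convolution(a_bits, b_bits):
    """Each detector k sums every product a[i] * b[j] with i + j == k."""
    sums = [0] * (len(a_bits) + len(b_bits) - 1)
    for i, a in enumerate(a_bits):
        for j, b in enumerate(b_bits):
            sums[i + j] += a * b
    return sums


def to_bits_lsb_first(value, width):
    """Binary digits of value, least significant bit first."""
    return [(value >> k) & 1 for k in range(width)]


def dmac_product(a, b, width=3):
    """Multiply two integers by convolving their bits, then weighting by 2**k."""
    sums = global_interconnect_convolution(
        to_bits_lsb_first(a, width), to_bits_lsb_first(b, width))
    return sum(c << k for k, c in enumerate(sums))


detector_sums = global_interconnect_convolution([1, 1, 1], [1, 1, 1])
print("detector sums for 7 x 7:", detector_sums)
print("5 x 6 via convolution:", dmac_product(5, 6))
```

Note that the detector outputs are multi-level (up to 3 for three-bit words), which is why this scheme is only quasi-digital.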



Figure 9: Carryless digital multiplication by analog convolution algorithm

## 6.0 Full Digital

Now the question becomes: what happens if, instead of using the detectors as summing nodes as in the quasi-digital case above, we use the detectors as Boolean summing devices, i.e. as thresholding devices or "OR" gates? Another way of stating this question is: what digital primitives are represented by digital convolution with a digital threshold?

In figure 10, the outputs are all placed onto a single detector. The detector is used as an "ORing" device which produces either a one or zero to the output gate which subsequently inverts the result.

Mathematically, the output can now be written as:

$$\begin{aligned} O = {} & a_2 b_0 \\ & + a_1 b_0 + a_2 b_1 \\ & + a_0 b_0 + a_1 b_1 + a_2 b_2 \\ & + a_0 b_1 + a_1 b_2 \\ & + a_0 b_2 \end{aligned}$$

Figure 10: Full Global full digital with single Boolean sum detector



This can be expressed, after algebraic grouping as:

$$= b_0(a_0 + a_1 + a_2) + b_1(a_0 + a_1 + a_2) + b_2(a_0 + a_1 + a_2)$$

This subsequently becomes:

$$= (b_0 + b_1 + b_2)(a_0 + a_1 + a_2)$$

The key to understanding the significance of this expression comes from applying DeMorgan's Law. DeMorgan's Law states:

$$X + Y = \overline{\overline{X}\,\overline{Y}}$$

Consequently after application of DeMorgan's Law the output becomes:

$$= \overline{\overline{b_0}\,\overline{b_1}\,\overline{b_2}} \cdot \overline{\overline{a_0}\,\overline{a_1}\,\overline{a_2}}$$

After output inversion by the output gate the final result can be written as:

$$= \overline{b_0} \overline{b_1} \overline{b_2} + \overline{a_0} \overline{a_1} \overline{a_2}$$

If the inputs are driven with the inversions of the bits instead of the bits themselves, then the output can be written:

$$= b_0 b_1 b_2 + a_0 a_1 a_2$$

Consequently, full global interconnect effectively produces the digital logic primitive of two N-bit wide AND gates followed by the OR-invert operation as shown in figure 11.
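The derivation can be confirmed by brute force over all input combinations. The following is a minimal sketch, assuming an ideal detector that outputs 1 whenever any interconnect path carries light and an ideal inverter at the output; the function names are illustrative.

```python
from itertools import product

# Sketch: exhaustive check of the full global digital primitive.
# Assumptions: the detector is an ideal Boolean OR over all light paths,
# and both SLMs are driven with the complements of the data bits.

def detector_or(plane_a, plane_b):
    """1 if any path a[i] AND b[j] is lit, across all pairs (i, j)."""
    return int(any(x and y for x in plane_a for y in plane_b))


def global_primitive(a_bits, b_bits):
    """Drive the planes with inverted bits, OR at the detector, then invert."""
    inverted_a = [1 - a for a in a_bits]
    inverted_b = [1 - b for b in b_bits]
    return 1 - detector_or(inverted_a, inverted_b)


def and_or_reference(a_bits, b_bits):
    """The claimed primitive: (a0 AND a1 AND ...) OR (b0 AND b1 AND ...)."""
    return int(all(a_bits) or all(b_bits))


def primitive_matches_reference(width=3):
    """Compare both forms for every combination of input bits."""
    for bits in product((0, 1), repeat=2 * width):
        a_bits, b_bits = bits[:width], bits[width:]
        if global_primitive(a_bits, b_bits) != and_or_reference(a_bits, b_bits):
            return False
    return True


print("three-bit words agree:", primitive_matches_reference(3))
print("five-bit words agree: ", primitive_matches_reference(5))
```

The check passes for any word width, reflecting that the result depends only on the all-pairs interconnect and not on the number of bits.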

If more than two SLMs are cascaded, the number of N-input AND gates feeding the OR gate grows with the number of SLMs. As can be seen from figure 11, the global interconnect primitive is similar to the parallel interconnect primitive described in reference 4, with the difference that the global interconnect primitive is far more powerful. The parallel interconnect primitive is essentially an array of 2-input AND gates followed by a multiple-input OR gate. Here we have multiple-input AND gate capability followed by a 2-input OR (or more if more devices are cascaded). Effectively we are graduating from the arbitrary selection of minterm functionals to the arbitrary selection of the sum of minterm functionals.



Figure 11: Full Global digital optical primitive for 2 level SLM cascade

## 7.0 Conclusion

Digital optical computing is becoming a very tough competitor to semiconductor technology, since it can support a very high degree of three-dimensional interconnect density and high degrees of fan-in, without capacitive loading effects, at very low power consumption levels.

## 8.0 REFERENCES

- [1] P.S. Guilfoyle, F.F. Zeise, "Reconfigurable Programmable Optical Digital Computer," Proceedings of the 1989 Third Topical Meeting on Optical Computing, Salt Lake City, Optical Society of America, Feb. 1989.
- [2] P.S. Guilfoyle, W.J. Wiley, "Combinatorial logic based digital optical computing," Applied Optics, vol. 27, Number 9, May 1, 1988.
- [3] P.S. Guilfoyle, W.J. Wiley, "Digital Optical Linear 3 x 3 Bit Combinatorial Systolic Multiplication Array," Proceedings of the SPIE, Real Time Signal Processing IX, Vol. 698-30, August 1986.
- [4] P.S. Guilfoyle, "Systolic Acousto-Optic Binary Convolver," Optical Engineering, vol. 23, Number 1, pg. 20-25, Jan./Feb. 1984.
- [5] Z. Kohavi, *Switching and Finite Automata Theory*, Second Edition, McGraw-Hill, ISBN 0-07-035310-7, New York, 1978, pg. 53.
- [6] C.E. Shannon, "A Symbolic Analysis of Relay and Switching Circuits," AIEE Transactions, Vol. 57, 1938, pp. 714-723.
- [7] V.N. Morozov, A.A. Elion, *Optoelectronic Switching Systems in Telecommunications and Computers*, Marcel Dekker, Inc., ISBN 0-8247-7163-X, New York, 1984, pg. 176.
- [8] J. Millman, C. Halkias, *Integrated Electronics: Analog and Digital Circuits and Systems*, McGraw-Hill, New York, ISBN 07-042315-6, 1972, pgs. 173-175.
- [9] E. Swartzlander, *Computer Arithmetic*, Dowden, Hutchinson and Ross, 1980.
- [10] R.A. Rice, W. Li, G. Moddel, "High-Speed Optically Addressed Spatial Light Modulator for Optical Computing," *Optical Computing 1989 Technical Digest Series*, Vol. 9, (Optical Society of America, Wash., D.C.) pg. 64.
- [11] D. Miller, "Quantum Well Devices for Optical Computing and Switching," *Optical Computing 1989 Technical Digest Series*, Vol. 9, (Optical Society of America, Wash., D.C.) pg. 413.
- [12] D. Collins, J. Sampsell, J. Florence, P. Penz, M. Gately, "Deformable Mirror Device Spatial Light Modulator and its Use in Neural Networks," *Spatial Light Modulators and Applications 1988 Technical Digest Series*, Vol. 8, (Optical Society of America, Wash., D.C.) pg. 102.
- [13] C. Kyriakakis, P. Asthana, R.V. Johnson, and A.R. Tanguay, Jr., "Spatial Light Modulators: Fundamental and Technological Issues," *1988 Technical Digest Series*, Vol. 8, *Spatial Light Modulators and Applications*, Optical Society of America, June 15-17, 1988.
- [14] ECL 10K/100K Data Manual, 1986, Signetics Corporation, 'Principal Characteristics of Logic Families', page v, Table 1.
- [15] ALS/AS Logic Databook, 1987, National Semiconductor Corporation, 'Family Comparison', page 1-36, Table 1.
- [16] MECL Device Data, 1987, Motorola Incorporated, 'MECL Family Comparisons', page 1-2, Figure 1a.
- [17] Bipolar Integrated Technology promotional brochure, 'Gate Delay versus Power Dissipation/Gate', graph.
- [18] "Optical Digital Computer Design Specification -- Phase I Final Technical Report," OptiComp Corporation, contract N00014-87-C-0077, September 1988.
- [19] A.B. Vander Lugt, "Signal Detection by Complex Spatial Filtering," *IEEE Trans. Inform. Theory*, IT-10:2, 1964.